In this paper, we present PyPop7, a pure-Python open-source library for black-box optimization (BBO). It provides a unified and modular interface to more than 60 versions and variants of black-box optimization algorithms, particularly population-based optimizers, which fall into 12 popular families: Evolution Strategies (ES), Natural Evolution Strategies (NES), Estimation of Distribution Algorithms (EDA), Cross-Entropy Method (CEM), Differential Evolution (DE), Particle Swarm Optimizer (PSO), Cooperative Coevolution (CC), Simulated Annealing (SA), Genetic Algorithms (GA), Evolutionary Programming (EP), Pattern Search (PS), and Random Search (RS). It also provides many examples, tutorials, and full-fledged API documentation. Through this new library, we aim to provide a well-designed platform for benchmarking optimizers and to promote their real-world applications, especially for large-scale BBO. Its source code and documentation are available at https://github.com/Evolutionary-Intelligence/pypop and https://pypop.readthedocs.io/en/latest, respectively.
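PyPop7's actual interface is documented at the links above; purely as an illustration of the black-box setting all of these optimizers share (only a fitness function, a search box, and an evaluation budget, with no gradients), here is a minimal sketch of the simplest member of the Random Search family. All names below are ours, not PyPop7's API.

```python
import random

def sphere(x):
    """Classic BBO benchmark: f(x) = sum(x_i^2), minimized at the origin."""
    return sum(v * v for v in x)

def pure_random_search(fitness, ndim, n_evals=2000,
                       lower=-5.0, upper=5.0, seed=0):
    """Sample uniformly in the box and keep the best solution found so far.
    Only black-box evaluations of `fitness` are used; no gradients."""
    rng = random.Random(seed)
    best_x, best_y = None, float("inf")
    for _ in range(n_evals):
        x = [rng.uniform(lower, upper) for _ in range(ndim)]
        y = fitness(x)
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

best_x, best_y = pure_random_search(sphere, ndim=2)
```

Every family listed in the abstract refines this loop in some way, e.g. by maintaining a population or adapting a sampling distribution, while keeping the same black-box contract.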
Link prediction is a crucial problem in graph-structured data. Due to the recent success of graph neural networks (GNNs), a variety of GNN-based models have been proposed to tackle the link prediction task. Specifically, GNNs leverage the message passing paradigm to obtain node representations, which relies on link connectivity. However, in a link prediction task, links in the training set are always present while those in the testing set are not yet formed, resulting in a discrepancy in connectivity patterns and a bias in the learned representations. This leads to a dataset shift problem that degrades model performance. In this paper, we first identify the dataset shift problem in the link prediction task and provide theoretical analyses of how existing link prediction methods are vulnerable to it. We then propose FakeEdge, a model-agnostic technique, to address the problem by mitigating the graph topological gap between training and testing sets. Extensive experiments demonstrate the applicability and superiority of FakeEdge on multiple datasets across various domains.
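The core idea of closing the topological gap, making the focal link's presence look the same at training and testing time, can be sketched in a few lines. The helper names and the choice of dropping (rather than adding) the focal edge are ours, not necessarily the paper's exact formulation.

```python
def edge_minus(adj, u, v):
    """Drop the focal link (u, v) from a copy of the subgraph, so a training
    link (present) and a testing link (absent) see the same topology."""
    sub = {n: set(nbrs) for n, nbrs in adj.items()}
    sub.setdefault(u, set()).discard(v)
    sub.setdefault(v, set()).discard(u)
    return sub

def edge_plus(adj, u, v):
    """The opposite convention: always insert the focal link before encoding."""
    sub = {n: set(nbrs) for n, nbrs in adj.items()}
    sub.setdefault(u, set()).add(v)
    sub.setdefault(v, set()).add(u)
    return sub

# Train-time subgraph around an existing link (0, 1):
adj = {0: {1, 2}, 1: {0}, 2: {0}}
train_view = edge_minus(adj, 0, 1)
# Test-time subgraph around a candidate link (1, 2) that is not yet formed
# (a no-op here, which is exactly the point -- both views agree):
test_view = edge_minus(adj, 1, 2)
```

Either convention works as long as it is applied identically to training and testing links, which is what removes the connectivity discrepancy described above.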
Deep neural networks (DNNs) have achieved excellent performance across various fields. However, their vulnerability to adversarial examples (AEs) hinders their deployment in safety-critical applications. This paper proposes a novel AE detection framework, named BEYOND, for trustworthy predictions. BEYOND performs detection by distinguishing the abnormal relations between AEs and their augmented versions (i.e., neighbors) from two perspectives: representation similarity and label consistency. An off-the-shelf self-supervised learning (SSL) model is used to extract representations and predict labels, owing to its highly informative representation capacity compared with supervised learning models. For clean samples, their representations and predictions closely align with those of their neighbors, whereas the neighbors of AEs differ greatly. Furthermore, we explain this observation and show that AEs can be effectively detected by exploiting this discrepancy. We establish a rigorous justification for the effectiveness of BEYOND. In addition, as a plug-and-play model, BEYOND can easily cooperate with adversarially trained classifiers (ATCs), achieving state-of-the-art (SOTA) robust accuracy. Experimental results show that BEYOND outperforms baselines by a large margin, especially under adaptive attacks. Empowered by the robust relation network built on SSL, we find that BEYOND outperforms baselines in terms of both detection ability and speed. Our code will be publicly available.
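The neighborhood test described above can be sketched in a few lines. The thresholds, helper names, and toy stand-ins for the SSL feature extractor and classifier below are all ours, purely for illustration.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)

def looks_adversarial(feature_fn, label_fn, augment_fn, x,
                      k=8, sim_thresh=0.8, agree_thresh=0.5, seed=0):
    """Flag x if it disagrees with its augmented 'neighbors' on either
    representation similarity or label consistency."""
    rng = random.Random(seed)
    neighbors = [augment_fn(x, rng) for _ in range(k)]
    f_x, y_x = feature_fn(x), label_fn(x)
    mean_sim = sum(cosine(f_x, feature_fn(n)) for n in neighbors) / k
    agreement = sum(label_fn(n) == y_x for n in neighbors) / k
    return mean_sim < sim_thresh or agreement < agree_thresh

# Toy stand-ins: features are the raw input, labels from a linear rule,
# augmentation is small additive noise.
feature_fn = lambda x: x
label_fn = lambda x: int(sum(x) > 0)
augment_fn = lambda x, rng: [v + rng.uniform(-0.1, 0.1) for v in x]
```

A confident, well-separated sample keeps both high similarity and full label agreement with its neighbors, while a sample sitting on a decision boundary (as adversarial perturbations tend to produce) fails at least one of the two checks.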
This paper proposes a novel out-of-distribution (OOD) detection framework for image classifiers, named MoodCat. MoodCat masks a random portion of the input image and uses a generative model to synthesize the masked image into a new image conditioned on the classification result. It then computes the semantic difference between the original and synthesized images. Compared with existing solutions, MoodCat naturally learns the semantic information of in-distribution data through the proposed masking and conditional synthesis strategy, which is essential for identifying OOD inputs. Experimental results demonstrate that MoodCat outperforms state-of-the-art OOD detection solutions.
Data augmentation is popular in the training of large neural networks; however, there is currently no clear theoretical comparison among the different algorithmic choices of how to use augmented data. In this paper, we take a step in this direction: we first present a simple and novel analysis of linear regression with label-invariant augmentations, showing that data augmentation consistency (DAC) is intrinsically more efficient than empirical risk minimization on augmented data (DA-ERM). The analysis is then extended to misspecified augmentations (i.e., augmentations that change labels), which again demonstrates the merit of DAC over DA-ERM. Furthermore, we extend the analysis to nonlinear models (e.g., neural networks) and present generalization bounds. Finally, we conduct experiments that make a clean, apples-to-apples comparison between DAC and DA-ERM using CIFAR-100 and WideResNet; together these demonstrate the superior efficacy of DAC.
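The distinction between the two objectives is easy to make concrete for the linear-regression setting the analysis starts from. A minimal sketch, with all names ours: DA-ERM averages the loss over original and augmented samples, while DAC fits the original data and adds an explicit penalty tying the model's outputs on a sample and its augmentation together.

```python
def predict(w, x):
    """Linear model prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def da_erm_loss(w, data, augment):
    """Empirical risk over the union of original and augmented samples."""
    total, n = 0.0, 0
    for x, y in data:
        for xi in (x, augment(x)):
            total += (predict(w, xi) - y) ** 2
            n += 1
    return total / n

def dac_loss(w, data, augment, lam=1.0):
    """ERM on the original data plus a consistency penalty that forces
    the model to produce the same output on x and augment(x)."""
    erm = sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)
    cons = sum((predict(w, x) - predict(w, augment(x))) ** 2
               for x, _ in data) / len(data)
    return erm + lam * cons

# True model w* = (1, 0); flipping the second coordinate is then
# a label-invariant augmentation.
data = [([1.0, 2.0], 1.0), ([2.0, -1.0], 2.0)]
flip = lambda x: [x[0], -x[1]]
```

Both losses vanish at the invariant solution, but DAC charges a strictly larger penalty to models that use the augmentation-sensitive direction, which is the mechanism behind its efficiency advantage.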
Learner corpora collect language data produced by L2 learners, i.e., learners of a second or foreign language. Such resources are relevant to second language acquisition research, foreign language teaching, and automatic grammatical error correction. However, little attention has been paid to learner corpora for Chinese as a Foreign Language (CFL) learners. We therefore propose to construct a large-scale, multidimensionally annotated Chinese learner corpus. To build the corpus, we first obtain a large number of topic-rich texts produced by CFL learners. We then design an annotation scheme that includes a sentence acceptability score as well as grammatical-error-based and fluency-based corrections. We build a crowdsourcing platform to perform the annotation effectively (https://yaclc.wenmind.net). We name the corpus YACLC (Yet Another Chinese Learner Corpus) and release it as part of the CUGE benchmark (http://cuge.baai.ac.cn). By analyzing the original sentences and annotations in the corpus, we find that YACLC has a considerable size and very high annotation quality. We hope this corpus can further advance research on Chinese international education and Chinese automatic grammatical error correction.
Modern deep neural networks struggle to transfer knowledge and generalize across different domains when deployed in real-world applications. Domain generalization (DG) has been introduced to learn universal representations from multiple source domains and thereby improve the network's generalization ability on unseen domains. However, previous DG methods focus only on data-level consistency schemes, without considering synergistic regularization among different consistency schemes. In this paper, we propose a novel Hierarchical Consistency framework for Domain Generalization (HCDG) by collaboratively integrating extrinsic consistency and intrinsic consistency. In particular, for extrinsic consistency, we leverage knowledge across multiple source domains to enforce data-level consistency. To better enhance this consistency, we design a novel Gaussian-mixture strategy for Fourier-based data augmentation, called DomainUp. For intrinsic consistency, we perform task-level consistency on the same instance under a dual-task scheme. We evaluate the proposed HCDG framework on two medical image segmentation tasks, i.e., optic cup/disc segmentation on fundus images and prostate MRI segmentation. Extensive experimental results demonstrate the effectiveness and versatility of our HCDG framework.
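Fourier-based augmentation of the kind DomainUp builds on rests on one observation: the amplitude spectrum carries low-level "style" while the phase carries structural "content". DomainUp itself operates on 2-D images and draws its mixing coefficient from a Gaussian mixture; the 1-D sketch below only illustrates the underlying amplitude mixing, with all helper names ours.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a real 1-D signal."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of the reconstruction."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def amplitude_mix(src, ref, lam):
    """Interpolate the amplitude spectrum of `src` toward that of `ref`
    while keeping the phase of `src`: style shifts toward the reference
    domain, content is preserved. `lam` would be drawn from a Gaussian
    mixture in DomainUp; here it is a plain parameter."""
    mixed = []
    for s, r in zip(dft(src), dft(ref)):
        amp = (1.0 - lam) * abs(s) + lam * abs(r)
        mixed.append(amp * cmath.exp(1j * cmath.phase(s)))
    return idft(mixed)

signal = [1.0, 2.0, 3.0, 4.0]       # "source-domain" sample
reference = [4.0, 3.0, 2.0, 1.0]    # sample from another source domain
```

With `lam = 0` the transform round-trips to the original signal; increasing `lam` moves its spectrum toward the reference domain, which is the cross-domain perturbation the extrinsic consistency term is enforced against.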
The neuron reconstruction from raw Optical Microscopy (OM) image stacks is the basis of neuroscience. Manual annotation and semi-automatic neuron tracing algorithms are time-consuming and inefficient. Existing deep learning neuron reconstruction methods, although demonstrating exemplary performance, depend heavily on complex rule-based components. Therefore, a crucial challenge is designing an end-to-end neuron reconstruction method that makes the overall framework simpler and model training easier. We propose a Neuron Reconstruction Transformer (NRTR) that discards the complex rule-based components and views neuron reconstruction as a direct set-prediction problem. To the best of our knowledge, NRTR is the first image-to-set deep learning model for end-to-end neuron reconstruction. In experiments using the BigNeuron and VISoR-40 datasets, NRTR achieves excellent neuron reconstruction results on comprehensive benchmarks and outperforms competitive baselines. Results of extensive experiments indicate that NRTR is effective, showing that neuron reconstruction can be viewed as a set-prediction problem, which makes end-to-end model training possible.
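Set-prediction training generally pairs each predicted element with a distinct ground-truth element via bipartite matching before computing a loss. The sketch below uses a greedy matcher as a simplified stand-in (set-prediction models typically use the optimal Hungarian algorithm); the names and cost function are ours, not NRTR's formulation.

```python
def match_sets(preds, targets, cost):
    """Greedily pair each prediction with a distinct target by ascending cost.
    Returns a list of (pred_index, target_index) pairs; unmatched elements
    would be penalized as false positives / misses in the full loss."""
    ranked = sorted((cost(p, t), i, j)
                    for i, p in enumerate(preds)
                    for j, t in enumerate(targets))
    used_p, used_t, pairs = set(), set(), []
    for c, i, j in ranked:
        if i not in used_p and j not in used_t:
            used_p.add(i)
            used_t.add(j)
            pairs.append((i, j))
    return pairs

# Toy cost between predicted and ground-truth node coordinates.
manhattan = lambda p, t: abs(p[0] - t[0]) + abs(p[1] - t[1])
```

Because the matching, not an ordering, defines which prediction is responsible for which ground-truth node, the model can be trained end-to-end without rule-based post-processing.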
We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image-inpainting-based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor (``known''), and the other controls the remaining factors (``unknown''). The sampled latent codes from the two sets jointly bi-modulate the convolution kernels to guide the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over state-of-the-art methods in result diversity and generation controllability.
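The bi-modulation idea, two latent codes jointly rescaling the generator's convolution kernels, can be sketched abstractly. The specific multiplicative form below is our assumption for illustration, not the paper's exact modulation.

```python
def bi_modulate(weights, code_known, code_unknown):
    """Scale each kernel weight by factors derived from both latent codes,
    so the 'known' code (a pre-defined factor) and the 'unknown' code
    (everything else) jointly steer the synthesis. The (1 + a)(1 + b)
    form is a hypothetical choice: zero codes leave weights unchanged."""
    assert len(weights) == len(code_known) == len(code_unknown)
    return [w * (1.0 + a) * (1.0 + b)
            for w, a, b in zip(weights, code_known, code_unknown)]
```

Sampling a fresh `code_unknown` while holding `code_known` fixed would vary the uncontrolled factors only, which is the mechanism behind explicit, factor-level control of the diverse outputs.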
Existing distantly supervised relation extractors usually rely on noisy data for both model training and evaluation, which may lead to garbage-in-garbage-out systems. To alleviate this problem, we study whether a small clean dataset can help improve the quality of distantly supervised models. We show that, besides enabling a more convincing evaluation of models, a small clean dataset also helps us build more robust denoising models. Specifically, we propose a new criterion for clean instance selection based on influence functions. It collects sample-level evidence for recognizing good instances (which is more informative than loss-level evidence). We also propose a teacher-student mechanism for controlling the purity of intermediate results when bootstrapping the clean set. The whole approach is model-agnostic and demonstrates strong performance on both denoising real (NYT) and synthetic noisy datasets.
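Influence-based selection asks, for each noisy candidate, whether training on it would also reduce the loss on the small clean set. A minimal sketch for a linear model, using a first-order gradient-alignment proxy (true influence functions also involve an inverse-Hessian term, omitted here); all names are ours.

```python
def grad_sq_loss(w, x, y):
    """Gradient of the squared loss (w . x - y)^2 for a linear model."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [2.0 * err * xi for xi in x]

def influence_score(w, candidate, clean_dev):
    """First-order influence proxy: alignment between the candidate's
    gradient and the average gradient on the clean dev set. Positive
    alignment suggests the candidate pushes the model the same way the
    clean data does."""
    gx = grad_sq_loss(w, *candidate)
    avg = [0.0] * len(w)
    for x, y in clean_dev:
        for d, g in enumerate(grad_sq_loss(w, x, y)):
            avg[d] += g / len(clean_dev)
    return sum(a * b for a, b in zip(gx, avg))

def select_clean(w, candidates, clean_dev, k):
    """Keep the k candidates whose estimated influence is most helpful."""
    return sorted(candidates,
                  key=lambda c: -influence_score(w, c, clean_dev))[:k]

w = [0.0]                        # current model
clean_dev = [([1.0], 1.0)]       # tiny trusted set
good = ([1.0], 1.0)              # consistent with the clean signal
noisy = ([1.0], -1.0)            # conflicting, likely mislabeled
```

This is sample-level evidence in the sense of the abstract: two instances with identical loss values can receive opposite scores depending on whether their gradients help or hurt the clean objective.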